Identifying faults in system data

ABSTRACT

A method (200) for identifying a fault in data representing a target variable of a system is disclosed. The system comprises a plurality of variables and each variable is represented by a data stream. The method comprises obtaining a reference data set for a set of variables in the system including the target variable (202), obtaining an operational data set for the set of variables in the system including the target variable (204) and, for each of the reference and operational data sets, constructing an adjacency matrix between the target variable and the other variables in the set of variables (208), wherein the adjacency matrix is constructed on the basis of a metric calculated between the target variable and the other variables of the set (208a). The method further comprises calculating a difference matrix between the adjacency matrices for the reference and operational data sets (210), and determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold (212).

TECHNICAL FIELD

The present disclosure relates to a method for identifying a fault in data representing a target variable of a system. The present disclosure also relates to a controller, and to a computer program and a computer program product configured, when run on a computer, to carry out a method for identifying a fault in data representing a target variable of a system.

BACKGROUND

The “Internet of Things” (IoT) refers to devices enabled for communication network connectivity, so that these devices may be remotely managed, and data collected or required by the devices may be exchanged between individual devices and between devices and application servers. Commercial, industrial and other systems are increasingly monitored and controlled using such connected devices, which may include sensors, actuators or other devices. Such devices typically have specified operating conditions, including ranges for temperature, pressure, etc., within which the device will operate reliably. For example, pressure sensors tend to give good results only in the temperature range of −20° C. to +50° C. (Applied Measure Limited, 2018, https://appmeas.co.uk/resources/pressure-measurement-notes/how-does-temperature-affect-pressure-sensors/). If the ambient temperature around a sensor falls outside this specified range, the pressure readings provided by the sensor may contain errors. Certain types of faults in pressure sensor data may therefore be identified by carefully analysing the pressure sensor data in the context of the corresponding temperature data.

In an operational deployment, it may be necessary to monitor a large number of system variables, giving rise to a wide range of potential sensor interferences that may cause faults in the system data. In order to identify any underlying system issues, and take appropriate action, it is first necessary to identify faults in the data on the basis of which the system is assessed. Faults in the data will often be caused by variations in other monitored variables or in environmental factors, which may themselves need to be addressed.

Faults in system data can generally be categorised as either (i) anomalies or (ii) other, non-anomalous data faults. FIG. 1 is a sample plot of monitoring data for a variable in which faults of both categories are represented.

Anomalies are generally fluctuations in the signal, such as spikes, unwanted amplitudes, etc. An example of an anomaly is shown on the right of FIG. 1. Anomalies are comparatively easy to identify because they can generally be differentiated from the non-faulty data, and anomaly detection is a relatively well-established class of fault detection.

Other data faults generally resemble the operational signal of correct data. An example of a fault of this category is illustrated on the left of FIG. 1. Because these faults closely resemble the normal variations of the correct data signal, they are very difficult to identify. Sharma et al. (A. Sharma, L. Golubchik and R. Govindan (2007). On the Prevalence of Sensor Faults in Real-World Deployments, 4th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, San Diego, Calif., 213-222) propose a method for identifying non-anomalous data faults using a clustering approach. A problem with this approach is that the clustering applications require input arguments, such as the number of clusters, to be passed. In addition, the performance of the method depends on the distance metric used. In another approach, Reppa et al. (Reppa, Vasso, Marios M. Polycarpou, and Christos G. Panayiotou (2016). Sensor fault diagnosis, Foundations and Trends® in Systems and Control, 1-248) categorise faults using a classification approach. In the method proposed by Reppa et al., the algorithm is sensitive to the shape of faults in the data, and accuracy is highly dependent on the level of noise in the data.

SUMMARY

It is an aim of the present disclosure to provide a method, apparatus and computer readable medium which at least partially address one or more of the challenges discussed above.

According to a first aspect of the present disclosure, there is provided a method for identifying a fault in data representing a target variable of a system, wherein the system comprises a plurality of variables, and wherein each variable is represented by a data stream. The method comprises obtaining a reference data set for a set of variables in the system including the target variable and obtaining an operational data set for the set of variables in the system including the target variable. The method further comprises, for each of the reference and operational data sets, constructing an adjacency matrix between the target variable and the other variables in the set of variables, wherein the adjacency matrix is constructed on the basis of a metric calculated between the target variable and the other variables of the set. The method further comprises calculating a difference matrix between the adjacency matrices for the reference and operational data sets, and determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold.

According to examples of the present disclosure, the method may further comprise an initial step of selecting a target variable.

According to examples of the present disclosure, a reference data set may comprise a data set of data collected at installation of the elements generating the data, or a data set collected at any other time when the expectation of errors in the data is low. According to examples of the present disclosure, an operational data set may comprise live data from the system, and may comprise the most recently available live data from the system.

According to examples of the present disclosure, the metric may comprise a combination of conditional correlation and conditional mutual information between the target variable and the other variables in the set. According to examples of the present disclosure, the correlation between the target variable and another variable may be conditioned on all other variables in the set of variables.

According to examples of the present disclosure, the metric may comprise a weighted sum of conditional correlation and conditional mutual information between the target variable and the other variables in the set.

According to examples of the present disclosure, conditional correlation between a target variable X and another variable Y may be calculated by iteratively solving the following formula:

$\rho[k] = \frac{\sigma[k] - \sum_{l=1}^{k-1} \rho[l]\,\sigma[k-l]}{1 - \sum_{l=1}^{k-1} \rho[l]\,\sigma[k-l]}$

where σ[k] is the value of the correlation between X and Y obtained at lag k using the equation:

$\sigma_{xy} = \frac{E\left[(X - \mu_x)(Y - \mu_y)\right]}{\sigma_x \sigma_y}$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of the variables X and Y, and $\mu_x$ and $\mu_y$ are the means of the variables X and Y.
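The recursion above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it uses NumPy, takes σ[k] as the Pearson correlation between X at time t and Y at time t + k (the text does not fix the lag convention), and evaluates the recursion exactly as written. The function names `lagged_correlation` and `conditional_correlation` are hypothetical, and because the text does not say which lag's value feeds the metric, the whole sequence ρ[1], …, ρ[max_lag] is returned.

```python
import numpy as np


def lagged_correlation(x, y, k):
    """Pearson correlation sigma[k] between x[t] and y[t + k], for lag k >= 0.

    Assumption: y is shifted forward by k samples relative to x; the text
    does not fix the lag convention.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if k > 0:
        x, y = x[:-k], y[k:]
    # sigma_xy = E[(X - mu_x)(Y - mu_y)] / (sigma_x * sigma_y)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())


def conditional_correlation(x, y, max_lag):
    """Return [rho[1], ..., rho[max_lag]] from the iterative formula."""
    # Index 0 is unused so that sigma[k] and rho[l] match the formula's indices.
    sigma = [None] + [lagged_correlation(x, y, k) for k in range(1, max_lag + 1)]
    rho = [None]
    for k in range(1, max_lag + 1):
        # S = sum_{l=1}^{k-1} rho[l] * sigma[k - l]  (empty sum is 0 for k = 1)
        s = sum(rho[l] * sigma[k - l] for l in range(1, k))
        if abs(1.0 - s) < 1e-12:
            raise ZeroDivisionError(f"denominator vanishes at lag {k}")
        rho.append((sigma[k] - s) / (1.0 - s))
    return rho[1:]
```

For k = 1 the sums are empty, so ρ[1] equals σ[1]; each later term subtracts the contribution already explained by the shorter lags.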

According to examples of the present disclosure, conditional mutual information between a target variable X and another variable Y, conditional upon a third variable Z, may be calculated using the following formula:

$I(X; Y \mid Z) = \sum_{z \in Z} \sum_{y \in Y} \sum_{x \in X} p_{X,Y,Z}(x,y,z) \log \frac{p_Z(z)\, p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)\, p_{Y,Z}(y,z)}$

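where $p_Z(z)$ is the probability mass function of variable Z, $p_{X,Y,Z}(x,y,z)$ is the joint probability mass function of variables X, Y and Z, and $p_{X,Z}(x,z)$ and $p_{Y,Z}(y,z)$ are the joint probability mass functions of the pairs (X, Z) and (Y, Z) respectively.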
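A plug-in estimate of the conditional mutual information formula, using empirical probability mass functions counted from paired samples, might look like this. It assumes the three variables are already discrete; continuous sensor readings would first need to be discretised (for example by binning), which the text does not specify. The natural logarithm is used, so the result is in nats, and the function name is hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in nats from paired discrete samples."""
    n = len(x)
    if not (len(y) == n and len(z) == n):
        raise ValueError("x, y and z must have the same length")
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    # Only observed triples are visited: terms with p_XYZ = 0 contribute 0.
    for (xi, yi, zi), c in n_xyz.items():
        # p_Z(z) p_XYZ(x,y,z) / (p_XZ(x,z) p_YZ(y,z)); the 1/n^2 factors cancel,
        # so the ratio can be formed directly from counts.
        ratio = (n_z[zi] * c) / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        total += (c / n) * math.log(ratio)
    return total
```

Two quick sanity checks on the counting: with X = Y and Z constant the estimate reduces to the entropy of X, and with Z = X it is zero.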

According to examples of the present disclosure, constructing the adjacency matrices may comprise using values of weights for the weighted sum that are at least one of default values, values selected on the basis of a hypothesis as to the relative importance of conditional correlation and conditional mutual information for the target variable, and/or values based on an optimisation calculation.

According to examples of the present disclosure, the optimisation calculation may be a previously performed optimisation calculation, as discussed in further detail below.

According to examples of the present disclosure, calculating a difference matrix may comprise subtracting the adjacency matrix for the operational data set from the adjacency matrix for the reference data set.

According to examples of the present disclosure, determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold may comprise performing a comparison between the difference matrix and a fault threshold and, if the difference matrix does not exceed the fault threshold, determining that the data representing the target variable in the operational data set does not include a fault.

According to examples of the present disclosure, determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold may further comprise, if the difference matrix exceeds the fault threshold, determining that the data representing the target variable in the operational data set includes a fault.

According to examples of the present disclosure, determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold may further comprise, if the difference matrix exceeds the fault threshold, performing an optimisation of the values of the weights for the weighted sum and constructing an updated adjacency matrix for each of the reference and operational data sets, wherein the updated adjacency matrices are constructed on the basis of a metric calculated using the optimised weight values. The determining step may further comprise recalculating the difference matrix on the basis of the updated adjacency matrices for the reference and operational data sets, performing a comparison between the recalculated difference matrix and the fault threshold and, if the recalculated difference matrix does not exceed the fault threshold, determining that the data representing the target variable in the operational data set does not include a fault.

According to examples of the present disclosure, the method may further comprise, if the recalculated difference matrix exceeds the fault threshold, determining that the data representing the target variable in the operational data set includes a fault.

According to examples of the present disclosure, the fault threshold may comprise a value, and performing a comparison between a difference matrix and the fault threshold may comprise comparing each entry in the difference matrix to the value of the fault threshold, the difference matrix exceeding the fault threshold if at least one entry in the difference matrix exceeds the value of the fault threshold.

According to examples of the present disclosure, the method may further comprise, if an entry in the difference matrix exceeds the value of the fault threshold, determining that the data representing the target variable in the operational data set includes a fault, and that the source of the fault in the data is the variable corresponding to the entry in the difference matrix that exceeds the threshold value.

According to examples of the present disclosure, the method may further comprise, if every entry in the difference matrix exceeds the value of the fault threshold, determining that the data representing the target variable in the operational data set includes a fault, and that the source of the fault in the data is the target variable.
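The per-entry comparison and the source attribution rules above can be sketched as follows. The sketch assumes the difference matrix for a single target is stored as a one-dimensional array with one entry per other variable, and that entries are compared by magnitude so that both a weakening and a strengthening of a dependency count; the text does not state whether signed or absolute values are compared. When several, but not all, entries exceed the threshold, the sketch reports all of the corresponding variables, which extrapolates the single-entry rule. The function `diagnose` is hypothetical.

```python
import numpy as np


def diagnose(delta, other_names, threshold, target_name):
    """Return (fault_detected, source) from a target's difference row.

    delta[j] is the difference-matrix entry for (target, other_names[j]).
    Assumption: entries are compared to the threshold by magnitude.
    """
    exceeded = np.abs(np.asarray(delta, dtype=float)) > threshold
    if not exceeded.any():
        return False, None            # no entry exceeds: no fault
    if exceeded.all():
        return True, target_name      # every entry exceeds: target is the source
    # Some entries exceed: the corresponding variables are reported as sources.
    return True, [name for name, e in zip(other_names, exceeded) if e]
```

For example, `diagnose([0.5, 0.02], ["a", "b"], 0.1, "t")` flags variable `a`, whereas `diagnose([0.5, -0.4], ["a", "b"], 0.1, "t")` attributes the fault to the target `t`.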

According to examples of the present disclosure, the fault threshold may be selected to account for expected statistical variation in the data.

According to examples of the present disclosure, constructing an adjacency matrix between the target variable and the other variables in the set of variables may comprise filtering the other variables in the set of variables according to the value of the metric calculated between the target variable and the other variables of the set, and including in the adjacency matrix those other variables of the set of variables that have a value of the calculated metric above an inclusion threshold.

According to examples of the present disclosure, performing an optimisation of the values of the weights for the weighted sum may comprise obtaining a plurality of operational data sets for the set of variables in the system including the target variable, the plurality of operational data sets including data for the set of variables at different times during operation of the system, and constructing an adjacency matrix between the target variable and the other variables in the set of variables for each of the plurality of operational data sets. Performing an optimisation may further comprise, for each of the plurality of operational data sets, calculating a difference matrix between the adjacency matrices for the reference and operational data sets, and identifying values for the weights for the weighted sum that minimise the sum, over all of the operational data sets, of the sum of all entries in each difference matrix.

According to examples of the present disclosure, identifying values of the weights for the weighted sum that minimise the sum, over all of the operational data sets, of the sum of all entries in each difference matrix may comprise solving the optimisation problem:

$\min_{w_1, w_2} \sum_{i=1}^{N_s} \sum \delta_i \quad \text{such that} \quad 0 \le w_1 \le 1, \quad 0 \le w_2 \le 1, \quad w_1 + w_2 = 1$

where $N_s$ is the number of operational data sets, $\delta_i$ is the difference matrix for operational data set i (the inner sum runs over all entries of $\delta_i$), and $w_1$ and $w_2$ are the weights of the weighted sum.
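One simple way to solve this constrained problem is a grid search over w₁, with w₂ = 1 − w₁ enforcing the equality constraint. Grid search is a swapped-in technique chosen for clarity; the text does not name a solver. The sketch also assumes that each difference matrix is built as in step 310 from per-variable vectors of conditional mutual information (M.C) and conditional correlation (C.C), and it sums the absolute values of the entries of each δ_i so that positive and negative changes cannot cancel; if the signed sum is intended, drop `np.abs`. All names are hypothetical.

```python
import numpy as np


def optimise_weights(mc_ref, cc_ref, mc_ops, cc_ops, n_grid=101):
    """Grid-search w1 in [0, 1] with w2 = 1 - w1; return (w1, w2, cost).

    mc_ref, cc_ref: reference M.C and C.C values, one per other variable.
    mc_ops, cc_ops: lists of the same, one entry per operational data set.
    """
    mc_ref = np.asarray(mc_ref, dtype=float)
    cc_ref = np.asarray(cc_ref, dtype=float)
    best = None
    for w1 in np.linspace(0.0, 1.0, n_grid):
        w2 = 1.0 - w1  # satisfies 0 <= w1, w2 <= 1 and w1 + w2 = 1
        cost = 0.0
        for mc_i, cc_i in zip(mc_ops, cc_ops):
            # delta_i = A1 - A2 = w1 (M.C_ref - M.C_i) + w2 (C.C_ref - C.C_i)
            delta_i = (w1 * (mc_ref - np.asarray(mc_i, dtype=float))
                       + w2 * (cc_ref - np.asarray(cc_i, dtype=float)))
            cost += np.abs(delta_i).sum()  # assumption: magnitudes are summed
        if best is None or cost < best[2]:
            best = (float(w1), float(w2), float(cost))
    return best
```

Because w₂ is fixed by w₁, this objective is a sum of absolute values of linear functions of w₁, so it is convex and piecewise linear on [0, 1]; a fine grid gets close to the optimum, and an exact minimiser lies at a breakpoint or an endpoint.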

According to examples of the present disclosure, the weight values for construction of the initial adjacency matrices in the optimisation problem may be those used in the adjacency matrices for the first comparison, that is, according to different examples, default values, values based on a hypothesis, or values based on a previous optimisation.

According to examples of the present disclosure, example time intervals for the plurality of operational data sets may include every 2 minutes, 5 minutes, 10 minutes, etc.

According to examples of the present disclosure, the method may further comprise, if it is determined that the data representing the target variable in the operational data set includes a fault, repeating the steps of the method for operational data sets at different time instances to identify the time instance at which the difference matrix first exceeds the fault threshold.
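If, as an assumption not stated in the text, a fault persists once it has appeared, the earliest time instance at which the difference matrix exceeds the fault threshold can be found with a binary search rather than by checking every instance in turn. In this sketch `exceeds` is a hypothetical callback that reruns the method on the operational data set for a given time instance and reports whether the threshold is exceeded; `first_fault_index` is likewise a hypothetical name.

```python
from typing import Callable, Optional, Sequence


def first_fault_index(times: Sequence, exceeds: Callable[[object], bool]) -> Optional[int]:
    """Index of the earliest time whose difference matrix exceeds the threshold.

    Assumes monotonicity: once exceeded at some time, it stays exceeded later.
    Returns None if no time instance exceeds the threshold.
    """
    lo, hi = 0, len(times)
    while lo < hi:
        mid = (lo + hi) // 2
        if exceeds(times[mid]):
            hi = mid        # fault present: the first occurrence is at mid or earlier
        else:
            lo = mid + 1    # no fault yet: look later
    return lo if lo < len(times) else None
```

If faults can appear and then disappear, the monotonicity assumption fails and a linear scan over the time instances is the safer choice.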

According to examples of the present disclosure, the system may comprise an Internet of Things (IoT) system.

According to examples of the present disclosure, the variables may comprise sensor measurements.

According to examples of the present disclosure, the method may further comprise selecting a new target variable, and repeating the steps of the method for the new target variable.

According to examples of the present disclosure, in the event of a single entry in the difference matrix exceeding the threshold value, the variable corresponding to that entry may be selected as the next target variable.

According to examples of the present disclosure, the method may further comprise obtaining an updated operational data set, and repeating the steps of the method with the updated operational data set.

According to examples of the present disclosure, the method may further comprise triggering an alarm if a fault is detected in the data, and/or triggering remedial action to address a source or cause of the fault.

According to another aspect of the present disclosure, there is provided a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out a method according to any one of the preceding aspects or examples of the present disclosure.

According to another aspect of the present disclosure, there is provided a carrier containing a computer program according to the preceding aspect of the present disclosure, wherein the carrier comprises one of an electronic signal, optical signal, radio signal or computer readable storage medium.

According to another aspect of the present disclosure, there is provided a computer program product comprising non-transitory computer readable media having stored thereon a computer program according to a preceding aspect of the present disclosure.

According to another aspect of the present disclosure, there is provided a controller for identifying a fault in data representing a target variable of a system, wherein the system comprises a plurality of variables, and wherein each variable is represented by a data stream. The controller comprises a processor and a memory, the memory containing instructions executable by the processor such that the controller is operable to obtain a reference data set for a set of variables in the system including the target variable, obtain an operational data set for the set of variables in the system including the target variable and, for each of the reference and operational data sets, construct an adjacency matrix between the target variable and the other variables in the set of variables, wherein the adjacency matrix is constructed on the basis of a metric calculated between the target variable and the other variables of the set. The controller is further operable to calculate a difference matrix between the adjacency matrices for the reference and operational data sets, and determine whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold.

According to examples of the present disclosure, the controller is further operable to carry out a method according to any one of the preceding aspects or examples of the present disclosure.

According to another aspect of the present disclosure, there is provided a controller for identifying a fault in data representing a target variable of a system, wherein the system comprises a plurality of variables, and wherein each variable is represented by a data stream. The controller is adapted to obtain a reference data set for a set of variables in the system including the target variable, obtain an operational data set for the set of variables in the system including the target variable and, for each of the reference and operational data sets, construct an adjacency matrix between the target variable and the other variables in the set of variables, wherein the adjacency matrix is constructed on the basis of a metric calculated between the target variable and the other variables of the set. The controller is further adapted to calculate a difference matrix between the adjacency matrices for the reference and operational data sets, and determine whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold.

According to examples of the present disclosure, the controller is further operable to carry out a method according to any one of the preceding aspects or examples of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the following drawings, in which:

FIG. 1 is a sample plot of monitoring data for a variable in which faults of different categories are represented;

FIG. 2 is a flow chart illustrating process steps in a method for identifying a fault in data representing a target variable of a system;

FIGS. 3a to 3d show a flow chart illustrating process steps in another example of a method for identifying a fault in data representing a target variable of a system;

FIG. 4 illustrates an example undirected graph for a target variable;

FIG. 5 illustrates simulated temperature data for a boiler;

FIG. 6 illustrates an example undirected graph for boiler temperatureand other variables;

FIG. 7 illustrates another example undirected graph for boilertemperature and other variables, constructed using a different metric;

FIG. 8 illustrates the data of FIG. 5 with an introduced fault;

FIG. 9 illustrates another example undirected graph for boiler temperature and other variables, constructed using an operational data set including the faulty data of FIG. 8;

FIG. 10 is a system flow diagram summarizing an implementation of a method for identifying a fault in data representing a target variable of a system;

FIG. 11 is a block diagram illustrating functional units in a controller; and

FIG. 12 is a block diagram illustrating functional units in another example of a controller.

DETAILED DESCRIPTION

Aspects of the present disclosure provide a method according to which a fault in system data is identified on the basis of a change in dependency between a target variable and other variables in the system. Thus, instead of attempting to identify faults in a data stream on the basis of the data stream alone, aspects of the present disclosure consider how the interdependency of the data stream with data streams representing other system variables evolves over time. The evolution over time in the interdependency of variables is examined through two data sets: a first data set comprising data recorded during a phase in which the expectation of errors in the data is low, and a second data set recorded during an operational phase of the system. The first data set forms a reference data set, and may for example be collected during an installation phase. Adjacency matrices are then constructed for each of the data sets, based on a metric calculated between a target variable in the system and the other variables in the system represented by each data set. Any difference between the adjacency matrices is then examined as an indication of a potential fault in the data.

FIG. 2 is a flow chart illustrating process steps in a method 200 for identifying a fault in data representing a target variable of a system, wherein the system comprises a plurality of variables, and wherein each variable is represented by a data stream. The system may for example be an Internet of Things (IoT) system comprising a plurality of devices, each device providing a data stream representing a system variable. The method may for example be performed by a controller. The controller may be deployed on any node with access to data from the system. With reference to FIG. 2, the method 200 comprises, in a first step 202, obtaining a reference data set for a set of variables in the system including the target variable. The method further comprises obtaining an operational data set for the set of variables in the system including the target variable at step 204. In step 208, the method 200 comprises, for each of the reference and operational data sets, constructing an adjacency matrix between the target variable and the other variables in the set of variables. As illustrated at step 208a, the adjacency matrix is constructed on the basis of a metric calculated between the target variable and the other variables of the set. The method 200 further comprises, at step 210, calculating a difference matrix between the adjacency matrices for the reference and operational data sets and, in step 212, determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold.

FIGS. 3a to 3c show a flow chart illustrating process steps in another example of a method 300 for identifying a fault in data representing a target variable of a system, wherein the system comprises a plurality of variables, and wherein each variable is represented by a data stream. The steps of the method 300 illustrate one way in which the steps of the method 200 may be implemented and supplemented in order to achieve the above discussed and additional functionality. As for the method 200 of FIG. 2 above, the method 300 may be performed by a controller. The controller may be deployed on any node with access to data from the system.

Referring to FIG. 3a, in a first step 302, the controller obtains a reference data set for a set of variables in the system including the target variable. The set of variables may include all variables in the system or may exclude one or more variables in the system. As discussed, the reference data set may comprise data recorded during a phase in which the expectation of errors in the data is low. This may for example comprise an installation phase of the devices, such as sensors or actuators, supplying the data. In other examples, the reference data may be obtained during a different phase in which the expectation of errors in the data is low, such as following a debugging operation. In step 304, the controller obtains an operational data set for the set of variables in the system including the target variable. The operational data set may for example comprise the most recent live operating data from the system, and may be updated periodically or in a scheduled manner, for example every 2, 5 or 10 minutes or at certain specific times. The operational data set may or may not contain faults or errors in the data.

In step 306, the controller selects a target variable for investigation. The target variable may be selected at random, or on a sequential basis in which all variables in the system are selected one after the other, or on the basis of insight obtained from system analysis or from an earlier iteration of the method (as discussed below with reference, for example, to step 342). In some examples, the selection of a target variable may be performed before the reference and operational data sets are obtained, and the variables to be included in the reference and operational data sets may be selected on the basis of the selected target variable. For example, any variables which may potentially impact the selected target variable may be included in the set of variables for the reference and operational data sets. In step 308, the controller constructs an adjacency matrix between the target variable and the other variables in the set of variables for each of the reference and operational data sets. As illustrated at 308a, the adjacency matrices are constructed on the basis of a metric calculated between the target variable and the other variables of the set.

The adjacency matrices may thus quantify the dependencies between the variables. The metric comprises a combination of the conditional correlation and conditional mutual information between the target variable and the other variables in the set. In the illustrated example, the combination is a weighted sum:

$M = w_1 (M.C) + w_2 (C.C)$

where M is the value of the metric, M.C is the conditional mutual information, C.C is the conditional correlation, and w1 and w2 are the weights used for the weighted sum.
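Computed over the other variables of the set, the weighted sum gives one row of the adjacency matrix for the target. A minimal NumPy sketch, assuming the conditional mutual information and conditional correlation values have already been calculated and are supplied as arrays aligned by variable (the function name `weighted_metric` is hypothetical):

```python
import numpy as np


def weighted_metric(mc, cc, w1=1.0, w2=1.0):
    """M = w1 * (M.C) + w2 * (C.C), element-wise over the other variables.

    mc[j], cc[j]: conditional mutual information and conditional correlation
    between the target and other variable j. The defaults w1 = w2 = 1 give
    the simple sum mentioned in the text.
    """
    return w1 * np.asarray(mc, dtype=float) + w2 * np.asarray(cc, dtype=float)
```

Setting w1 = 0 and w2 = 1 reduces the metric to conditional correlation alone, which corresponds to a system in which correlation is believed to capture the interdependencies better.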

The values of the weights w1, w2 may be based on a hypothesis about the relative importance of conditional correlation and conditional mutual information to the target variable, or may be based on a previously performed optimisation (as discussed in further detail with reference to FIG. 3d). In some examples, the values of the weights w1, w2 may be default values. For example, default values of w1=1 and w2=1 may be selected to calculate a simple sum of the conditional correlation and conditional mutual information. In some example implementations of the method, the values of the weights may be selected according to the particular application in which the faults are to be detected, or according to the nature of the target variable. For example, in some systems, conditional correlation may more accurately reflect the interdependencies between variables than conditional mutual information, and in other systems the reverse may be true. The details of calculating conditional correlation and conditional mutual information, including example equations, are provided below, following the present discussion of FIGS. 3a to 3c.

As illustrated at 308c, the controller may filter the other variables in the set of variables according to the value of the metric calculated between the target variable and the other variables, before constructing the adjacency matrices. The adjacency matrices may be constructed on the basis of variables having a value of the metric calculated with the target variable over an inclusion threshold. The inclusion threshold may be selected in any appropriate manner. It may for example be expressed as a maximum number of variables, such that the variables associated with the X highest calculated metric values are included in the adjacency matrix, or as a metric value, such that all variables having a metric value over the threshold are included in the adjacency matrix. Other examples for calculating and representing the inclusion threshold may be envisaged.
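Both forms of inclusion threshold described above can be sketched in one helper: `max_vars` keeps the X highest-scoring variables, and `min_value` keeps only variables whose metric exceeds a value. The helper `filter_variables` and its parameter names are hypothetical, and it assumes that larger metric values indicate stronger dependency.

```python
import numpy as np


def filter_variables(names, metric, max_vars=None, min_value=None):
    """Keep the other variables that pass the inclusion threshold.

    names[j] is another variable of the set; metric[j] is its metric with
    the target. Returns the kept names and metric values, highest first.
    """
    metric = np.asarray(metric, dtype=float)
    order = np.argsort(metric)[::-1]  # indices sorted by decreasing metric
    keep = [int(i) for i in order if min_value is None or metric[i] > min_value]
    if max_vars is not None:
        keep = keep[:max_vars]        # the X highest metric values
    return [names[i] for i in keep], metric[keep]
```

The two criteria can also be combined, for example keeping at most five variables and only those above a minimum metric value.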

The adjacency matrices may describe an undirected graph of the variables in the set, in which each variable comprises a node and the edges between the variables are weighted according to the value of the calculated metric.

Referring still to FIG. 3a, in step 310, the controller calculates a difference matrix between the adjacency matrices for the reference and operational data sets. This may comprise subtracting the operational data set matrix from the reference data set matrix. Thus, for a reference adjacency matrix A1 and an operational adjacency matrix A2, the difference matrix δ is calculated as:

$\delta = A_1 - A_2$

If M.C and C.C are the matrices of values of conditional mutual information and conditional correlation for the data sets, with subscripts 1 and 2 denoting the reference and operational data sets respectively, the difference matrix δ may be expressed as:

$\delta = w_1 (M.C)_1 + w_2 (C.C)_1 - w_1 (M.C)_2 - w_2 (C.C)_2$
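A short worked example confirms that δ = A₁ − A₂ and the expanded form agree, because for fixed weights the metric is linear in M.C and C.C. The numbers below are illustrative only and are not taken from the disclosure; each array holds one value per other variable of the set.

```python
import numpy as np

w1, w2 = 0.4, 0.6
# Hypothetical M.C and C.C values for three other variables.
mc_ref, cc_ref = np.array([0.30, 0.10, 0.25]), np.array([0.80, 0.20, 0.60])
mc_op, cc_op = np.array([0.28, 0.12, 0.05]), np.array([0.78, 0.22, 0.10])

a_ref = w1 * mc_ref + w2 * cc_ref   # A1, reference data set
a_op = w1 * mc_op + w2 * cc_op      # A2, operational data set
delta = a_ref - a_op                # delta = A1 - A2

# Expanded form: w1 (M.C)1 + w2 (C.C)1 - w1 (M.C)2 - w2 (C.C)2
delta_expanded = w1 * mc_ref + w2 * cc_ref - w1 * mc_op - w2 * cc_op
print(np.round(delta, 2))
```

Here the third variable dominates δ (about 0.38 against ±0.02 for the others), so under a per-entry comparison it is the entry most likely to exceed the fault threshold.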

In step 314, the controller performs a comparison between the difference matrix and a fault threshold. This comparison is then used to determine whether the data representing the target variable includes a fault, as set out in the following method steps.

As illustrated at 314a, a value of the fault threshold may be selected to account for expected statistical variation in the data. For example, the dependency between the variables may be expected to vary by a small amount purely as a result of statistical variation in the data. The fault threshold may be selected such that this expected variation does not cause the difference matrix to exceed the fault threshold. A bootstrapping technique may be used to compute the threshold. An example bootstrapping technique involves generation of artificial samples from the existing data by changing measurement errors in the data. As illustrated at 314b, performing a comparison may comprise comparing each entry in the difference matrix to the value of the fault threshold. The difference matrix is considered to exceed the fault threshold if at least one entry in the difference matrix exceeds the value of the fault threshold.
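The text describes the bootstrapping only loosely, so the following is one possible reading in which every detail is an assumption: additive Gaussian noise with standard deviation `noise_std` stands in for changed measurement errors, the absolute Pearson correlation stands in for the weighted metric, and the threshold is taken as a high quantile of the largest entry-wise deviation between the original and perturbed adjacency rows. The function names `corr_row` and `bootstrap_threshold` are hypothetical.

```python
import numpy as np


def corr_row(data, target):
    """Stand-in metric: |Pearson correlation| between the target column and each other column."""
    c = np.corrcoef(data, rowvar=False)[target]
    return np.abs(np.delete(c, target))


def bootstrap_threshold(data, target, noise_std, n_boot=200, quantile=0.99, seed=0):
    """Fault threshold from artificially perturbed copies of the reference data.

    data: array of shape (n_samples, n_variables); target: column index.
    """
    rng = np.random.default_rng(seed)
    base = corr_row(data, target)
    max_dev = np.empty(n_boot)
    for b in range(n_boot):
        # Assumption: measurement errors are modelled as additive Gaussian noise.
        perturbed = data + rng.normal(0.0, noise_std, size=data.shape)
        max_dev[b] = np.max(np.abs(base - corr_row(perturbed, target)))
    return float(np.quantile(max_dev, quantile))
```

The quantile controls a trade-off: a higher quantile tolerates more natural variation and so raises fewer false alarms, at the cost of missing smaller faults.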

In step 316, the controller checks whether or not the difference matrix exceeds the threshold. If the difference matrix does not exceed the threshold, then the controller concludes that there is no fault in the data representing the target variable, and the controller proceeds to step 318 of the method. Step 318 is described in further detail below, and comprises checking whether all variables in the system have been considered, allowing, if appropriate, for the selection of a new target variable and the performance of the method to investigate the possibility of faults in the data of the newly selected target variable. If the difference matrix exceeds the threshold, the controller may determine in step 320 that there is a fault in the data, and may take appropriate action, as discussed below in steps 340 to 350. In other examples, the controller may first establish that the difference is not caused by a sub-optimal choice of weights for use in the weighted combination metric, as set out below.

In step 322, the controller performs an optimization of the values of the weights for the weighted sum that is calculated as the metric for constructing the adjacency matrices. The details of this optimization procedure are discussed below, with reference to FIG. 3d. The result of the optimization procedure is new values of the weights w₁ and w₂ for use in calculating the metric between the target variable and the other variables in the variable set. On the basis of these optimized weights, the controller then constructs, at step 324, an updated adjacency matrix for each of the reference and operational data sets. In step 326, the controller then recalculates the difference matrix on the basis of the updated adjacency matrices for the reference and operational data sets. In step 328, the controller performs a comparison between the recalculated difference matrix and the fault threshold. The fault threshold may be the same threshold as was used in step 314 or may be different. A similar bootstrapping procedure to that described above may also be used to select the fault threshold for step 328. As discussed above with reference to step 314, performing a comparison may comprise comparing each entry in the difference matrix to the value of the fault threshold. The difference matrix is considered to exceed the fault threshold if at least one entry in the difference matrix exceeds the value of the fault threshold.

If the controller determines at step 330 that the recalculated difference matrix does not exceed the fault threshold, then the value or values in the original difference matrix that caused the matrix to exceed the threshold were a consequence of inappropriate weighting values, and the controller thus determines at step 332 that the data representing the target variable in the operational data set does not include a fault. If the recalculated difference matrix exceeds the fault threshold, then the value or values in the original difference matrix that caused the matrix to exceed the threshold were not a consequence of inappropriate weighting values, and the controller determines at step 334 that the data representing the target variable in the operational data set includes a fault.
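The two-stage decision of steps 310 to 334 can be sketched as follows. This is a minimal illustration under stated assumptions: `difference_for` is a hypothetical callable that rebuilds both adjacency matrices for a given pair of weights and returns their difference matrix, `optimise` is a hypothetical stand-in for the FIG. 3d procedure, and entries are compared by absolute value.

```python
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]
Weights = Tuple[float, float]


def exceeds(delta: Matrix, threshold: float) -> bool:
    """At least one entry above the threshold (absolute value assumed)."""
    return any(abs(v) > threshold for row in delta for v in row)


def detect_fault(
    difference_for: Callable[[Weights], Matrix],
    initial_weights: Weights,
    optimise: Callable[[], Weights],
    threshold: float,
    recheck_threshold: Optional[float] = None,
) -> bool:
    """Return True if the target variable's operational data is judged faulty."""
    # Steps 310-316: difference matrix built with the initial weights.
    if not exceeds(difference_for(initial_weights), threshold):
        return False  # no fault: move on to the next target variable
    # Step 322: the exceedance may be a weighting artefact, so re-optimise the weights.
    new_weights = optimise()
    # Steps 324-330: rebuild, recompare (the second threshold may differ from the first).
    second = threshold if recheck_threshold is None else recheck_threshold
    return exceeds(difference_for(new_weights), second)


def illustrative_difference(w: Weights) -> Matrix:
    """Hypothetical difference matrix that shrinks as weight moves onto w2."""
    return [[0.9 * w[0] - 0.1]]


# The first exceedance disappears after re-optimisation, so no fault is reported.
print(detect_fault(illustrative_difference, (1.0, 0.0), lambda: (0.1, 0.9), threshold=0.1))  # False
```

Passing the matrix construction in as a callable keeps the decision logic separate from how the metric is computed, so the same control flow applies whichever estimators are used for the two dependency measures.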

Referring now to FIG. 3c, the controller then proceeds to determine the origin or cause of the fault in the data. If an entry in the adjacency matrix is unchanged between the reference and operational data sets, that is, if the corresponding entry in the difference matrix is zero (or within a margin of statistical variation), then the relation between the target variable and the corresponding other or influencing variable is unchanged. It may therefore be inferred that the influencing variable is not causing a change in the target variable. If an entry in the adjacency matrix changes between the reference and operational data sets, that is, if the corresponding entry in the difference matrix is non-zero (and above a margin of statistical variation), then the relation between the target variable and the corresponding other or influencing variable has changed. If only that relation has changed, then the change in the relation may be attributed to the effect of the influencing variable on the target variable. If the relation of all the variables to the target variable changes, then the change can be attributed to a fault in the target variable.

In step 336, the controller checks whether or not every entry in the difference matrix exceeds the fault threshold. If not every entry in the difference matrix exceeds the threshold, then the controller determines in step 338 that the source of the fault in the data is the variable corresponding to the entry in the difference matrix that exceeds the threshold value. In step 340, the controller repeats the steps of the method for operational data sets at different time instances to identify the time instance at which the difference matrix first exceeds the fault threshold. This time instance represents the time at which the identified variable began to affect the readings for the target variable. This insight may assist with further fault investigation and/or identifying correct data for the target variable on the basis of which operational decisions may be made. The controller may then investigate the identified variable as a target variable in step 342. This step comprises returning to step 306 of the method and selecting the identified variable as the target variable, before continuing with the steps of the method as described. If the identified variable is the source of the error in the original target variable data, then it is possible that other variables may have been affected, and this may be represented in the evolution of the dependencies between the identified variable and the other variables of the system, as illustrated by a difference matrix constructed according to examples of the present disclosure. Alternatively, or in addition, the controller may trigger an alarm or may directly trigger remedial measures on the basis of the identified fault in the original target variable data and the identified variable that is the source of the fault.

Returning to step 336, if the controller determines that every entry in the difference matrix exceeds the value of the fault threshold, then the controller determines at step 346 that the source of the fault in the data is the target variable. The controller then proceeds, in step 348, to repeat the steps of the method for operational data sets at different time instances to identify the time instance at which the difference matrix first exceeds the fault threshold. As discussed above with reference to step 340, this repetition allows the controller to identify the time instance at which the error in the data begins and the target variable begins to potentially affect the validity of data for other variables. For example, if a temperature sensor has exceeded its operational threshold, its own readings may contain faults, but other devices may also have exceeded their operational thresholds for temperature, meaning that their data may also be investigated to determine whether or not errors appear in their data at around the same time. In step 350, the controller may trigger an alarm or remedial measures as previously discussed.
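The attribution rule of steps 336 to 348 can be sketched in a few lines. In this minimal sketch, `delta_row` is assumed to be the target variable's row of the difference matrix (one entry per other variable), entries are compared by absolute value, and the function names and variable labels are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def locate_fault_source(
    delta_row: Sequence[float], names: Sequence[str], threshold: float
) -> Tuple[Optional[str], List[str]]:
    """Classify the target variable's row of the difference matrix.

    Returns (None, []) if no entry exceeds the threshold, ("target", []) if
    every entry does (step 346), or ("influencing", [...]) listing the
    variables whose entries exceed it (step 338).
    """
    exceeding = [n for n, v in zip(names, delta_row) if abs(v) > threshold]
    if not exceeding:
        return None, []
    if len(exceeding) == len(names):
        return "target", []
    return "influencing", exceeding


def first_fault_time(
    rows_by_time: Sequence[Tuple[float, Sequence[float]]], threshold: float
) -> Optional[float]:
    """Steps 340/348: earliest time instance whose difference row exceeds the threshold."""
    for t, row in rows_by_time:
        if any(abs(v) > threshold for v in row):
            return t
    return None


# Illustrative values: only the relation with the water level has changed.
names = ["water level", "pressure", "outside temperature"]
print(locate_fault_source([0.78, 0.01, 0.02], names, 0.08))  # ('influencing', ['water level'])
```

Scanning time instances in order, as `first_fault_time` does, returns the first exceedance; a bisection search would need fewer matrix constructions but assumes the fault persists once it has started.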

In step 352, the controller checks whether all variables have been considered. This may be a check on all variables in the system, or on all variables from a set of variables for which the accuracy of the data is to be investigated. If not all variables have yet been considered, the controller returns to step 306 and selects a new target variable before executing the remaining method steps as described above. If all variables have been considered, then the controller returns to step 306 to obtain a newly updated operational data set and proceeds with the selection of a target variable and the remaining method steps, thereby checking for a fault in the system data at a new time increment.

The example methods 200 and 300 described above represent a robust, data-driven approach to the identification of faults in system data that cannot be classified as anomalies. No input arguments are required from a user, and the method is applicable to any kind of variable, including those that do not lend themselves to clustering or other currently used analysis techniques for non-anomalous data errors.

A detailed discussion of an example metric and its calculation is now provided. This discussion applies to the metric which is calculated between the target variable and other variables in the system in order to construct the adjacency matrices according to examples of the present disclosure. The construction of adjacency matrices as described above is analogous to the construction of a graph, and a graph is used in the following discussion to illustrate the example metric.

FIG. 4 illustrates an example undirected graph in which the central node 402 is the target variable or variable of interest, and the other nodes 404, 406, 408, 410, 412 are the variables influencing the variable of interest 402. The influencing nodes are connected to the node 402 representing the target variable with edges which have varying strengths. As already discussed, a metric, which may comprise a weighted sum of conditional correlation and conditional mutual information, is used to compute the strength of the edges of the graph. The thick edges between nodes 406 and 402 and between nodes 412 and 402 correspond to the influencing nodes 406 and 412, which are connected most strongly to the target node 402. The thinner edges connect nodes which are more weakly connected to the target node. A strong connection between an influencing node and the target node indicates that the variables represented by these nodes have a strong interdependency. Conversely, a weak connection between nodes indicates a weak interdependency.

After initial metric calculation and graph construction, the variables influencing the central node may be filtered based on the values of the adjacency matrix. For example, if the strength of an edge is low, it can be inferred that the interdependency between the target variable and the relevant influencing variable is low. A threshold may be set such that only the nodes connected by the X strongest edges are retained, or only nodes connected with an edge strength over a threshold value are maintained. In this manner, a subset graph is obtained with fewer nodes, simplifying subsequent analysis and computation.
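Both filtering options can be sketched with one small function. This is a minimal sketch under stated assumptions: edge strengths are given as a mapping from node label to metric value, edges are ranked by absolute strength (since a weighted sum that contains a correlation term can be negative), and the function name and strength values are hypothetical; only the node labels are borrowed from FIG. 4.

```python
from typing import Dict, Optional


def filter_influencers(
    strengths: Dict[str, float],
    top_x: Optional[int] = None,
    min_strength: Optional[float] = None,
) -> Dict[str, float]:
    """Keep the X strongest edges and/or the edges above a strength threshold.

    Ranking by absolute strength is an assumption of this sketch.
    """
    ranked = sorted(strengths.items(), key=lambda kv: abs(kv[1]), reverse=True)
    if min_strength is not None:
        ranked = [(k, v) for k, v in ranked if abs(v) > min_strength]
    if top_x is not None:
        ranked = ranked[:top_x]
    return dict(ranked)


# Hypothetical edge strengths between the target node 402 and its neighbours.
edges = {"404": 0.2, "406": 0.9, "408": 0.1, "410": 0.15, "412": 0.8}
print(filter_influencers(edges, top_x=2))  # {'406': 0.9, '412': 0.8}
```

The two criteria can also be combined, for example keeping at most X edges and only those above a minimum strength.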

According to examples of the present disclosure, the metric used for construction of the graph is a weighted sum of conditional correlation and conditional mutual information. The importance of this metric is explored below, followed by a detailed discussion of how it may be calculated.

An example system is proposed comprising three variables x, y, z and a data generating process of:

$x = 2z + e_1$

$y = 3z + e_2$

In this process, e₁, e₂ are white noise vectors and variables x and y are generated by the equations given above. In this example system, it is noted that the variables x and y are not related directly, but rather are related through the variable z.

Data was generated for the variables x, y and z, and correlation values $\sigma_{xy}$, $\sigma_{yz}$ and $\sigma_{xz}$ were computed. Correlation gives an indication of linear connections between variables. For this example system, the correlation values were computed as $\sigma_{xy} = 0.95$, $\sigma_{yz} = 0.97$ and $\sigma_{xz} = 0.98$.

It will be appreciated that although the variables x and y are not connected directly, the correlation value between x and y is high (0.95), suggesting that these variables are strongly connected. Correlation alone can thus lead to misleading results during graph construction, as a high correlation value can be obtained even when two variables are not connected directly.

The conditional correlation value between the variables x and y is much lower than the standard correlation. The conditional correlation between x and y was calculated as 0.02. Using conditional correlation may therefore provide a more accurate representation of interdependencies, and so a more accurate adjacency matrix and graph.

In one example of the present disclosure, it is proposed to use Pearson's correlation to compute the estimate of the correlation. The Pearson's correlation between two variables X and Y is computed as

$\sigma_{xy} = \frac{E\left[(X - \mu_x)(Y - \mu_y)\right]}{\sigma_x \sigma_y}$

where $\sigma_x$, $\sigma_y$ are the standard deviations of the random variables X and Y respectively and $\mu_x$, $\mu_y$ are the means of the variables X and Y respectively. As mentioned above, correlation has the disadvantage of modelling both direct and indirect linear dependencies. For the purposes of the metric used to calculate adjacency matrices in examples of the present disclosure, it is desirable to represent direct connections between two variables. Examples of the present disclosure therefore calculate the conditional correlation (or partial correlation) between a target variable and another variable, quantifying only the direct linear dependencies rather than the total dependencies between the variables. The conditional or partial correlation computes the correlation between two variables conditioned on all other variables, so measuring direct linear dependencies.
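The contrast between correlation and conditional correlation in the x, y, z example above can be reproduced with a short pure-Python sketch. The disclosure uses the estimator of Ha and Sun cited below; as a simpler stand-in, this sketch computes the classical partial correlation conditioned on all other variables from the inverse covariance (precision) matrix P, namely ρ_ij = −P_ij / sqrt(P_ii P_jj), and the ordinary Pearson correlation by normalizing the covariance matrix. The noise standard deviation of 0.3, the sample size and the seed are assumptions, so the printed values will not exactly reproduce 0.95 and 0.02.

```python
import random
from typing import List

Matrix = List[List[float]]


def covariance_matrix(columns: List[List[float]]) -> Matrix:
    """Sample covariance matrix of equally long data series (one per variable)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    k = len(columns)
    return [
        [
            sum((columns[i][t] - means[i]) * (columns[j][t] - means[j]) for t in range(n)) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]


def partial_correlations(columns: List[List[float]]) -> Matrix:
    """Correlation of each pair conditioned on all other variables.

    Assumption: classical precision-matrix formula, not the Ha and Sun ridge estimator.
    """
    p = invert(covariance_matrix(columns))
    k = len(p)
    return [
        [1.0 if i == j else -p[i][j] / (p[i][i] * p[j][j]) ** 0.5 for j in range(k)]
        for i in range(k)
    ]


rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x = [2 * v + rng.gauss(0.0, 0.3) for v in z]  # x = 2z + e1
y = [3 * v + rng.gauss(0.0, 0.3) for v in z]  # y = 3z + e2

cov = covariance_matrix([x, y, z])
corr_xy = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5  # Pearson correlation
pcorr = partial_correlations([x, y, z])
print(f"correlation(x, y) = {corr_xy:.2f}, partial correlation(x, y | z) = {pcorr[0][1]:.2f}")
```

The correlation between x and y comes out high while the partial correlation given z is close to zero, mirroring the behaviour described for the example system.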

There are several ways of measuring partial correlation between two variables. In examples of the present disclosure, it is proposed to use the computation method disclosed in Ha, Min Jin, and Wei Sun. “Partial Correlation Matrix Estimation Using Ridge Penalty Followed by Thresholding and Reestimation.” Biometrics 70.3 (2014): 762-770. PMC. Web. 4 Apr. 2018. The expression used in this reference is a robust measure of partial correlation. The partial or conditional correlation may therefore be calculated using the expression:

$\rho[k] = \frac{\sigma[k] - \sum_{l=1}^{k-1} \rho[l]\,\sigma[k-l]}{1 - \sum_{l=1}^{k-1} \rho[l]\,\sigma[k-l]}$

This formula is solved iteratively to obtain the conditional correlation at lag k. In this formula, σ[k] is the value of the correlation obtained at lag k. The conditional correlation between every pair of variables is calculated in this way, starting from the correlation σ[k] calculated between the two variables of each pair.
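The recursion can be transcribed directly. In this minimal sketch, `lagged_corr[k - 1]` is assumed to hold σ[k] for k = 1, …, K (how these lagged correlations are obtained is not shown), and the function name `recursive_conditional_correlation` is hypothetical. The code implements the expression exactly as printed above; for k = 2 it reduces to (σ[2] − σ[1]²)/(1 − σ[1]²).

```python
from typing import List, Sequence


def recursive_conditional_correlation(lagged_corr: Sequence[float]) -> List[float]:
    """rho[k] = (sigma[k] - S_k) / (1 - S_k), with S_k = sum_{l=1}^{k-1} rho[l] * sigma[k-l].

    Transcription of the printed recursion. Input assumption: lagged_corr[k - 1]
    holds sigma[k]. The returned list holds rho[1..K] in order.
    """
    rho: List[float] = []
    for k in range(1, len(lagged_corr) + 1):
        # rho[l] is rho[l - 1] here and sigma[k - l] is lagged_corr[k - l - 1] (0-based lists).
        s = sum(rho[l - 1] * lagged_corr[k - l - 1] for l in range(1, k))
        rho.append((lagged_corr[k - 1] - s) / (1.0 - s))
    return rho


print(recursive_conditional_correlation([0.5, 0.25]))  # [0.5, 0.0]
```

For lagged correlations 0.5 and 0.25, which decay geometrically, the lag-2 value is zero: once the lag-1 term is accounted for, nothing further remains to explain at lag 2.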

Conditional correlation provides an indication of direct linear dependencies in the time domain. In many systems, the variables for analysis may additionally be connected in a non-linear fashion. Mutual information is a measure used to quantify the non-linear dependencies between variables, as set out in Cover, T. M., & Thomas, J. A. (2012). Elements of information theory. John Wiley & Sons. However, mutual information has the same disadvantage as correlation, in that it models both direct and indirect dependencies. Examples of the present disclosure therefore propose to use conditional mutual information to model direct non-linear dependencies.

Mutual information between two variables works on the principle of computing correlation on the probability distribution functions of the variables, so estimating the non-linear dependencies between them. The conditional mutual information between two random variables X and Y conditioned on another variable Z is computed, as given in Cover and Thomas (2012), as:

$I(X;Y \mid Z) = \sum_{z \in Z} \sum_{y \in Y} \sum_{x \in X} p_{X,Y,Z}(x,y,z) \log \frac{p_Z(z)\, p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)\, p_{Y,Z}(y,z)}$
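For discrete (or previously discretized) data, the probabilities in the expression can be replaced by relative frequencies. The following minimal sketch is such a plug-in estimator, returning values in nats (natural logarithm). The discretization of continuous sensor readings, which is where sensitivity to probability density estimation arises, is not shown and is an assumption of the sketch; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def conditional_mutual_information(
    xs: Sequence[Hashable], ys: Sequence[Hashable], zs: Sequence[Hashable]
) -> float:
    """Plug-in estimate of I(X;Y|Z) in nats from paired discrete samples.

    Assumption: the samples are already discrete (e.g. binned sensor readings).
    """
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    total = 0.0
    for (x, y, z), c in c_xyz.items():
        # p(z) p(x,y,z) / (p(x,z) p(y,z)); the 1/n factors cancel, leaving counts.
        ratio = (c_z[z] * c) / (c_xz[(x, z)] * c_yz[(y, z)])
        total += (c / n) * math.log(ratio)
    return total


# X and Y identical fair bits, Z constant: I(X;Y|Z) equals ln 2.
print(conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0]))
# X and Y both copies of Z: once Z is known, nothing is left to share.
print(conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]))
```

The second call shows the "direct dependency" property that motivates the measure: X and Y are perfectly dependent, but all of that dependency is explained by Z.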

It will be appreciated that the calculation of conditional mutual information requires estimation of probability density functions, which can be difficult. Conditional mutual information is therefore a less robust measure, as its value can change with small changes in the estimated probability density function. It may be desirable to seek to correct for the potential error in the calculation of conditional mutual information.

From the above discussion, it may be appreciated that conditional correlation is calculated from measured readings and is therefore a robust measure, whereas the conditional mutual information is calculated from an estimated probability density function (it may also be estimated from data). Conditional mutual information is not therefore a robust measure. However, conditional mutual information may be of greater usefulness, as it quantifies direct, non-linear dependencies, whereas conditional (partial) correlation is limited to direct linear dependencies. Examples of the present disclosure therefore propose to combine the conditional correlation and conditional mutual information in a weighted manner, so balancing the robustness offered by conditional correlation with the non-linear dependencies quantified by conditional mutual information.

Some examples of the present disclosure propose to use a weighted sum to combine conditional correlation and conditional mutual information. As discussed above, the initial weights for the sum may be selected to be default values (for example 1), or may be selected on the basis of some hypothesis or theory as to the relative importance of conditional correlation and conditional mutual information to the particular target variable under consideration. In other examples, the weights may be chosen according to a previously performed optimization procedure. Weight optimization is performed according to some examples of the present disclosure in the event that an initial comparison between a difference matrix and a fault threshold suggests the presence of a fault in the data. A high value entry in a difference matrix, which entry is above a fault threshold value, may be caused by a fault in the data, but it may also be caused by an inappropriate choice of weights for the weighted sum. The optimal weights for the weighted sum will depend upon the details of the underlying system whose data is being analyzed for fault detection. An optimization procedure allows this possibility to be discounted, ensuring that a difference matrix that exceeds a fault threshold is truly indicative of a fault in the data. Referring again to FIG. 3b, the optimization process is conducted according to the method 200 after an initial comparison indicating that a difference matrix exceeds a fault threshold.

FIG. 3d illustrates process steps that may be carried out as part of the optimization procedure. As a reminder, the metric used for the construction of the adjacency matrices according to the method 200 is a weighted sum of conditional mutual information and conditional correlation:

$M = w_1(M.C) + w_2(C.C)$

where M is the computed metric, M.C is conditional mutual information, C.C is conditional correlation, and w₁, w₂ are the weights used in the computation.

With reference to FIG. 3d, in a first step 322a of the optimization procedure, the controller obtains a plurality of operational data sets for the set of variables in the system including the target variable, the plurality of operational data sets including data for the set of variables at different times during operation of the system. In step 322b, the controller constructs an adjacency matrix between the target variable and the other variables in the set of variables for each of the plurality of operational data sets. The value of the weights used to compute the metric for the adjacency matrices may be some value between 0 and 1 that is a default value, a hypothesis value or a value from a previous optimization, as discussed above. In step 322c, the controller calculates, for each of the plurality of operational data sets, a difference matrix between the adjacency matrices for the reference and operational data sets, resulting in a plurality of difference matrices, one difference matrix for each of the operational data sets obtained.

As noted above, a difference matrix between a reference adjacency matrix A₁ and an operational adjacency matrix A₂ is calculated as:

$\delta = A_1 - A_2$

$\delta = w_1(M.C)_1 + w_2(C.C)_1 - w_1(M.C)_2 - w_2(C.C)_2$

Thus, for the plurality of $N_s$ operational data sets, $N_s$ difference matrices are obtained:

$\delta_i = w_1(M.C)_{PD} + w_2(C.C)_{PD} - w_1(M.C)_i - w_2(C.C)_i, \quad i = 1, 2, \ldots, N_s$

where the subscript PD denotes the reference (primary) data set.

A difference matrix having entries over a fault threshold may be obtained as a consequence of a fault in the data or as a consequence of using non-optimal weights in the calculation of the metric used to construct the adjacency matrices. In order to minimize the effect of the weights on the difference matrices, an optimization problem may be used to identify values for the weights of the weighted sum that minimize the sum, over all of the operational data sets, of the sum of all entries in each difference matrix. This equates to attempting to minimize, for different weight values, the deviation from the adjacency matrix obtained from the reference data set. This optimization problem may be expressed as:

$\min_{w_1, w_2} \sum_{i=1}^{N_s} \sum \delta_i \quad \text{such that} \quad 0 \le w_1 \le 1, \quad 0 \le w_2 \le 1, \quad w_1 + w_2 = 1$

The constraints ensure that the weights obtained lie between 0 and 1 and sum to 1.
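A minimal sketch of the weight optimization follows, for a single row of the adjacency matrices (the target variable against each other variable). Substituting w₂ = 1 − w₁ leaves a one-dimensional problem over w₁ in [0, 1]. Two points are assumptions of the sketch rather than part of the disclosure: a simple grid search replaces a linear programming solver, and by default the absolute values of the entries are summed, reading the objective as a deviation from the reference adjacency matrix. With signed entries (`use_abs=False`) the objective is linear in w₁, so its minimum lies at w₁ = 0 or w₁ = 1 unless the objective is constant. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def total_deviation(
    w1: float,
    mc_ref: Sequence[float],
    cc_ref: Sequence[float],
    mc_ops: Sequence[Sequence[float]],
    cc_ops: Sequence[Sequence[float]],
    use_abs: bool = True,
) -> float:
    """Sum over the N_s operational data sets of the entries of delta_i."""
    w2 = 1.0 - w1  # equality constraint w1 + w2 = 1
    total = 0.0
    for mc_op, cc_op in zip(mc_ops, cc_ops):
        for m_r, c_r, m_o, c_o in zip(mc_ref, cc_ref, mc_op, cc_op):
            delta = w1 * m_r + w2 * c_r - w1 * m_o - w2 * c_o
            # Assumption: absolute deviations by default; signed entries if use_abs=False.
            total += abs(delta) if use_abs else delta
    return total


def optimise_weights(
    mc_ref: Sequence[float],
    cc_ref: Sequence[float],
    mc_ops: Sequence[Sequence[float]],
    cc_ops: Sequence[Sequence[float]],
    steps: int = 100,
    use_abs: bool = True,
) -> Tuple[float, float]:
    """Grid search over w1 in [0, 1] (assumption: stands in for an LP solver)."""
    grid = [i / steps for i in range(steps + 1)]
    best = min(grid, key=lambda w1: total_deviation(w1, mc_ref, cc_ref, mc_ops, cc_ops, use_abs))
    return best, 1.0 - best


# Illustrative values: one influencing variable and one operational data set.
print(optimise_weights([1.0], [0.0], [[0.0]], [[1.0]]))  # (0.5, 0.5)
```

In the illustrative case δ = 2w₁ − 1, so the absolute deviation vanishes at w₁ = 0.5, whereas the signed objective would be driven to the boundary w₁ = 0.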

The result of the optimization problem comprises the optimal values of the weights w₁ and w₂ for the system and target variable under consideration. As discussed above with reference to FIG. 3b and FIG. 3c, these optimal weights may be used to recalculate the adjacency matrices of the primary (reference) and most recent secondary (operational) data sets, allowing for the calculation of an updated difference matrix. If the updated difference matrix still exceeds the fault threshold, then it may be inferred that this is a consequence of a fault in the data, and is not caused by the selection of inappropriate values for the weights w₁ and w₂.

An example implementation of a method according to the present disclosure is discussed below. In the example implementation, the method is performed by a node which may be deployed anywhere in the system. The only requirement on the node deployment is that it should have a direct connection to data from the system. The node should be capable of performing the computations necessary to carry out the method. Data from the system is fed into the node, and the calculations to perform the method are performed on the node itself or, in some other examples, may be outsourced to a virtualized function or resource. Depending on the output of the node, an alarm may be triggered which will indicate to an end user the presence of a fault in the system data. The alarm can take the form of one or more messages, an action such as moving sensors so that the effect of the fault can be mitigated, etc. Any update to the node can be made on the fly to tune the parameters of the method.

The example implementation is performed on a synthetic data set in which the target variable is the temperature of water in a boiler. Other variables in the system include outside temperature, pressure of the water in the boiler and level of the water in the boiler. To demonstrate the performance of the example method in identifying dummy factors, the flow of water into another boiler is also considered as a potential influencing variable. The simulated temperature of the boiler is illustrated in FIG. 5.

A reference data set is obtained comprising data values obtained at a simulated installation phase, and an undirected graph is constructed between the boiler temperature and other variables. The graph is constructed on the basis of a metric comprising a weighted combination of conditional correlation and conditional mutual information as discussed above. For this initial calculation, estimated weight values of w₁ = 1 and w₂ = 1 are used. The constructed graph is shown in FIG. 6.

The graph provides a visual representation of the adjacency matrix for the primary data set, with the thickness of the edges connecting nodes representing the magnitude of the values in the adjacency matrix for the metric calculated between the connected nodes. It can be seen from the constructed graph that the flow into the boiler 602 has only a minimal effect on the temperature 604, whereas the level of the water in the boiler 606 has the greatest effect. The graph matches the physics of a genuine boiler system: a boiler surface is generally highly insulated, and changes in outside environmental temperature have a reduced effect on the temperature inside the boiler. In contrast, the level of the water in the boiler has a direct influence on the temperature of the boiler.

For the sake of comparison, FIG. 7 illustrates a similar graph constructed using a metric of conditional correlation only, as opposed to the weighted sum of conditional correlation and conditional mutual information that was used to construct the graph of FIG. 6. It can be seen from FIG. 7 that the external temperature 708 is indicated as having a strong influence on the temperature of water in the boiler 704. This does not agree with the physics of the system, in which the insulation between the boiler and the external environment means the external temperature can have only a limited effect on the water temperature inside the boiler. The metric of a weighted sum of conditional correlation and conditional mutual information, used to construct the graph of FIG. 6, therefore provides a more accurate representation of the system.

An artificial fault is then introduced in the temperature of the boiler between 21 and 40 seconds, as illustrated in FIG. 8. An undirected graph is constructed using an operational data set including the faulty data. The graph is filtered to maintain only the most closely connected nodes and is illustrated in FIG. 9.

The difference between the two graphs is computed and the difference is analyzed for the factors affecting the process. Updated weight values of w₁ = 0.7 and w₂ = 0.3 are used. Following the introduction of the faulty data, the strength of the connection between the boiler temperature and the level of the water in the boiler appears to decrease. A fault may therefore be expected in the target variable of boiler water temperature. In this example illustration, the relevant value of the adjacency matrix, that is the value of the weighted sum metric, was calculated as 1.45 in the reference data set and 0.67 in the operational data set. A fault threshold of 0.08 is used. An optimization process is then performed as discussed above to determine optimal weights. The optimization process returns optimal weight values of w₁ = 0.45 and w₂ = 0.55.

The adjacency matrices for the reference and operational data sets are then reconstructed using the optimal weights, and the difference matrix is calculated on the basis of the updated adjacency matrices. The relevant value of the adjacency matrix, that is the value of the weighted sum metric, was calculated as 1.45 in the reference data set and 0.82 in the operational data set. A fault threshold of 0.06 is used, meaning the relevant entry in the difference matrix is above the fault threshold and indicates a fault in the data. The exact time at which the fault occurs can be obtained by iteratively computing the operational data set adjacency matrix for different time instances. This iteration concludes that the temperature in the boiler has a fault between 21 and 40 seconds, which matches the original data. The source of the fault is assigned to the variable ‘level of the water in the boiler’.
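The arithmetic behind the two comparisons in this example, taking the relevant entry of δ = A₁ − A₂ (reference minus operational), is:

$\delta = 1.45 - 0.67 = 0.78 > 0.08$ (weights w₁ = 0.7, w₂ = 0.3)

$\delta = 1.45 - 0.82 = 0.63 > 0.06$ (optimized weights w₁ = 0.45, w₂ = 0.55)

In both cases the entry exceeds the fault threshold, which is why the fault is retained after the weights are re-optimized.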

The procedure discussed above may be followed to investigate any of the variables in the example system, and repeated iterations at different time instances for variables demonstrating a dependency change between the reference and operational data sets may allow for the identification not only of faulty data but also of the precise time at which the fault occurred. On the basis of the analysis, a likely source of the data fault may also be identified, allowing for the generation of an alarm and/or appropriate recommendations or actions to address the fault.

For the sake of comparison, the system is also analyzed using a conventional method set out in Reppa, Vasso, Marios M. Polycarpou, and Christos G. Panayiotou (2016). Sensor fault diagnosis, Foundations and Trends® in Systems and Control, 1-248. The analysis according to the method set out in Reppa et al. fails to identify the artificially inserted fault in the data. Examples of the present disclosure thus provide a more effective way of identifying non-anomalous faults in system data. Examples of the present disclosure additionally allow for the identification of a factor or variable likely to be the cause of the fault in the data, allowing for remedial action to be taken.

The computational complexity of examples of the present disclosure is discussed below. The adjacency matrix, or graph, for the reference data set is computed once and stored in a database. An adjacency matrix or graph is then computed for operational data sets at scheduled or periodic time intervals, for example every 2, 5 or 10 minutes. This calculation of the adjacency matrix requires calculation of conditional mutual information and conditional correlation. On a local machine with a 4th generation Intel Core i5 processor and 8 GB of RAM, an adjacency matrix of size 6×6 takes approximately 1 second to compute (as the quantities involved have analytical expressions). The optimization problem for determining optimal weight values takes approximately 2 seconds to solve (using linear programming). These computations can therefore be performed on a standard Raspberry Pi deployed in the edge network, as they are not overly costly. Notifications may be sent only when faults are detected, or may be sent on a periodic basis, confirming that a method for fault detection is being performed.

Example systems in which methods according to the present disclosure may be implemented encompass a wide range of industrial, commercial and other systems, including factories, laboratories, manufacturing plants, power stations etc. Another example system in which the methods according to the present disclosure may be implemented is a mine. The Pueblo Viejo mine is a gold mine located in the north-central region of the Dominican Republic in the Sánchez Ramírez Province. At Pueblo Viejo, the gold is extracted by injecting high-purity oxygen into autoclaves operating at 230° C. and 40 bar of pressure. The resulting chemical reactions oxidize the sulfide minerals within which the gold is trapped. The mine authorities use controllers such as a PID controller to control the temperature and pressure within the mine. These controllers perform certain actuations based on sensor measurements of the temperature and pressure. Any faults in the temperature and pressure measurements can result in incorrect actuations being performed, with potentially catastrophic consequences. The temperature in the mine is affected by a range of variables including coolant flow, the number of people in the mine, etc. Any fault in temperature data, whether arising from the temperature measurement itself or from any of the other variables which may affect the temperature measurements, can be identified online during operation of the mine using examples of the present disclosure, so allowing corrective measures to be taken to avoid a potential accident.

FIG. 10 is a system flow diagram summarizing an implementation of methods according to examples of the present disclosure. With reference to FIG. 10, information from a system or plant 1002 and data obtained during an installation phase 1004 are obtained to allow for calculation of adjacency matrices for the operational and reference data sets respectively. A difference matrix is calculated at 1006 and the difference matrix is compared to a fault threshold at 1008. If the difference matrix does not exceed the fault threshold, then the system is operating correctly. If the difference matrix exceeds the threshold, then an optimization is performed at 1010 for the weights used in computing the metric for constructing the adjacency matrices. An updated difference matrix is then compared to the fault threshold at 1012. If the updated difference matrix does not exceed the fault threshold, then the system is operating correctly, and the previous result was a consequence of non-optimal weights. If the updated difference matrix exceeds the threshold, then a fault is deemed to be present in the data and an alarm is triggered.

As discussed above, the methods 200, 300 may be performed by a controller, which may for example be a management node in an IoT system, and may be a physical node or a Virtualised Network Function. FIG. 11 is a block diagram illustrating an example controller 1100 which may implement the methods 200, 300 according to examples of the present disclosure, for example on receipt of suitable instructions from a computer program 1150. Referring to FIG. 11, the controller 1100 comprises a processor or processing circuitry 1102, a memory 1104 and interfaces 1106. The memory 1104 contains instructions executable by the processor 1102 such that the controller 1100 is operative to conduct some or all of the steps of the method 200 and/or 300. The instructions may also include instructions for executing one or more telecommunications and/or data communications protocols. The instructions may be stored in the form of the computer program 1150. In some examples, the processor or processing circuitry 1102 may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, etc. The processor or processing circuitry 1102 may be implemented by any type of integrated circuit, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) etc. The memory 1104 may include one or several types of memory suitable for the processor, such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, solid state disk, hard disk drive etc.

FIG. 12 illustrates functional units in another example of controller 1200 which may execute examples of the methods 200, 300 of the present disclosure, for example according to computer readable instructions received from a computer program. It will be understood that the units illustrated in FIG. 12 are functional units, and may be realised in any appropriate combination of hardware and/or software. The units may comprise one or more processors and may be integrated to any degree.

Referring to FIG. 12, the controller 1200 is for identifying a fault in data representing a target variable of a system, wherein the system comprises a plurality of variables, and wherein each variable is represented by a data stream. The controller 1200 comprises a data module 1202 for obtaining a reference data set for a set of variables in the system including the target variable and for obtaining an operational data set for the set of variables in the system including the target variable. The controller 1200 further comprises a graph module for constructing an adjacency matrix between the target variable and the other variables in the set of variables for each of the reference and operational data sets, wherein the adjacency matrix is constructed on the basis of a metric calculated between the target variable and the other variables of the set. The controller 1200 further comprises a difference module for calculating a difference matrix between the adjacency matrices for the reference and operational data sets. The controller 1200 further comprises a fault module 1208 for determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold. The controller 1200 further comprises interfaces 1210. The term module may have its conventional meaning in the field of electronics, electrical devices and/or electronic devices and may include, for example, electrical and/or electronic circuitry, devices, processors, processing circuitry, memories, logic, solid state and/or discrete devices, computer programs or instructions for carrying out respective tasks, procedures, computations, outputs, and/or displaying functions, and so on, such as those that are described in the present disclosure.

Examples of the present disclosure thus provide an efficient method for detection of faults in system data, which is particularly effective at detecting non-anomalous faults, which are generally much more difficult to identify. The identification of such faults may help improve decisions taken on the fly during operation.

In any industrial or commercial system, many variables may be present and various sensors may be used to measure them. For example, in some situations both temperature and pressure are to be monitored, and the temperature readings can be affected by changes in pressure and vice versa. As an example, it may be required to switch on the heater inside a boiler whenever the temperature of the boiler falls below a threshold temperature. A fault in the temperature sensor can result in the heater being switched on even if the true temperature is above the threshold. Identifying the faulty temperature data before the heater is switched on avoids an incorrect action being taken on the basis of that data, and so saves power. For a large network with thousands of sensors, estimating the interaction between sensors is challenging. Examples of the present disclosure propose a robust method to estimate and represent the interaction between variables and, on this basis, to identify faults in any of the variables.

According to examples of the present disclosure, two adjacency matrices are constructed, reflecting connections between variables on the basis of a metric which may comprise a combination of conditional correlation and conditional mutual information. A first adjacency matrix is constructed for a reference data set obtained when faults in the data are unlikely (at installation or after performing a de-noising exercise). A second adjacency matrix is constructed for an operational data set obtained during live operation of the system. The difference between the adjacency matrices is calculated to obtain a difference matrix S. On the basis of the difference matrix, an optimisation process may be used to calculate optimal weights for a metric that is a weighted sum of conditional correlation and conditional mutual information. Once the optimal weights have been calculated, the difference matrix may be updated and compared to a fault threshold to determine whether the difference matrix indicates the presence of a fault in the data.
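Claim 4 gives an iterative formula for the conditional-correlation term of this metric. The sketch below applies that recursion exactly as written, assuming σ[k] is the Pearson correlation between x[t] and y[t + k] over the overlapping samples of two equal-length series. The lag convention, the function names and the choice of `max_lag` are illustrative assumptions, and how the per-lag values ρ[1..K] are reduced to a single adjacency entry is not specified here.

```python
import numpy as np


def lagged_correlation(x, y, k):
    """sigma[k]: Pearson correlation of x[t] with y[t + k] (assumed lag convention)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x[: len(x) - k], y[k:])[0, 1])


def conditional_correlation(x, y, max_lag):
    """rho[1..max_lag] from the claim-4 recursion, with sigma[k] from lagged_correlation."""
    sigma = {k: lagged_correlation(x, y, k) for k in range(1, max_lag + 1)}
    rho = {}
    for k in range(1, max_lag + 1):
        # Numerator and denominator share the same sum over earlier rho values;
        # for k = 1 the sum is empty, so rho[1] equals sigma[1].
        s = sum(rho[l] * sigma[k - l] for l in range(1, k))
        rho[k] = (sigma[k] - s) / (1.0 - s)
    return [rho[k] for k in range(1, max_lag + 1)]
```

Only lags k ≥ 1 are needed, because the recursion refers to σ[k − l] with l ≤ k − 1. The denominator 1 − Σ ρ[l]σ[k − l] can approach zero for strongly correlated series, so a practical implementation would need to guard against that case.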

Examples of the present disclosure may offer one or more of the following advantages:

A generalised fault identification method able to consider all factors affecting a target variable, and to provide alarms or recommendations when a fault is identified.

A method capable of identifying different types of data faults in an unsupervised manner.

A robust metric for adjacency matrix construction, which enables the identification of faults that would otherwise not be identified and is able to distinguish between direct and indirect dependencies between variables.
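The ability of a conditional metric to separate direct from indirect dependencies can be illustrated with the conditional mutual information formula of claim 5. The sketch below evaluates that formula on a discrete joint probability mass function stored as a NumPy array indexed [x, y, z]. Estimating the pmf from sensor data (for example by binning) is left out, and the function names and the example chain X → Z → Y are illustrative assumptions.

```python
import numpy as np


def mutual_information(p_xy):
    """I(X;Y) in nats from a joint pmf indexed [x, y]."""
    p = np.asarray(p_xy, dtype=float)
    p = p / p.sum()
    p_x, p_y = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[nz] * np.log(p[nz] / np.outer(p_x, p_y)[nz])))


def conditional_mutual_information(p_xyz):
    """I(X;Y|Z) in nats from a joint pmf indexed [x, y, z], following claim 5."""
    p = np.asarray(p_xyz, dtype=float)
    p = p / p.sum()
    p_z = p.sum(axis=(0, 1))  # p_Z(z)
    p_xz = p.sum(axis=1)      # p_{X,Z}(x, z)
    p_yz = p.sum(axis=0)      # p_{Y,Z}(y, z)
    total = 0.0
    for x, y, z in zip(*np.nonzero(p)):
        total += p[x, y, z] * np.log(p_z[z] * p[x, y, z] / (p_xz[x, z] * p_yz[y, z]))
    return float(total)


# Indirect dependency: X drives Z, and Z drives Y (a chain X -> Z -> Y).
p_x = np.array([0.5, 0.5])
p_z_given_x = np.array([[0.9, 0.1], [0.1, 0.9]])
p_y_given_z = np.array([[0.8, 0.2], [0.2, 0.8]])
p_xyz = np.einsum("x,xz,zy->xyz", p_x, p_z_given_x, p_y_given_z)

print(mutual_information(p_xyz.sum(axis=2)))   # ≈ 0.12 nats: X and Y look related
print(conditional_mutual_information(p_xyz))  # ~0: no direct X-Y link once Z is known
```

An unconditional measure would report a link between X and Y here, whereas the conditional measure attributes the dependency to Z, which is the distinction the metric relies on when building the adjacency matrices.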

The methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.

It should be noted that the above-mentioned examples illustrate rather than limit the disclosure, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.

1. A method for identifying a fault in data representing a target variable of a system, wherein the system comprises a plurality of variables, and wherein each variable is represented by a data stream, the method comprising: obtaining a reference data set for a set of variables in the system including the target variable; obtaining an operational data set for the set of variables in the system including the target variable; for each of the reference data set and the operational data set: constructing an adjacency matrix between the target variable and other variables in the set of variables, wherein the adjacency matrix is constructed on the basis of a metric calculated between the target variable and the other variables of the set; calculating a difference matrix between the adjacency matrices for the reference and operational data sets; and determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold.
2. The method of claim 1, wherein the metric comprises a combination of conditional correlation and conditional mutual information between the target variable and the other variables in the set.
3. The method of claim 1, wherein the metric comprises a weighted sum of conditional correlation and conditional mutual information between the target variable and the other variables in the set.
4. The method of claim 2, wherein conditional correlation between the target variable X and another variable Y is calculated by iteratively solving the following formula: $\rho[k] = \frac{\sigma[k] - \sum_{l=1}^{k-1} \rho[l]\,\sigma[k-l]}{1 - \sum_{l=1}^{k-1} \rho[l]\,\sigma[k-l]}$ where: $\sigma[k]$ is the value of the correlation between X and Y obtained at lag k using the equation: $\sigma_{xy} = \frac{E\left[(X - \mu_x)(Y - \mu_y)\right]}{\sigma_x \sigma_y}$ where: $\sigma_x, \sigma_y$ are the standard deviations of the variables X and Y, and $\mu_x, \mu_y$ are the means of the variables X and Y.
5. The method of claim 2, wherein conditional mutual information between the target variable X and another variable Y conditional upon a third variable Z is calculated using the following formula: $I(X;Y \mid Z) = \sum_{z \in Z} \sum_{y \in Y} \sum_{x \in X} p_{X,Y,Z}(x,y,z) \log \frac{p_Z(z)\, p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)\, p_{Y,Z}(y,z)}$ where: $p_Z(z)$ is the probability mass function of variable Z, and $p_{X,Y,Z}(x,y,z)$ is the joint probability mass function of variables X, Y, Z.
6. The method of claim 3, wherein constructing the adjacency matrices comprises using values of weights for the weighted sum that are at least one of: default values; values selected on the basis of a hypothesis as to the relative importance of conditional correlation and conditional mutual information for the target variable; or values based on an optimization calculation.
7. The method of claim 1, wherein calculating a difference matrix comprises subtracting the adjacency matrix for the operational data set from the adjacency matrix for the reference data set.
8. The method of claim 1, wherein determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold comprises: performing a comparison between the calculated difference matrix and the fault threshold; and if the calculated difference matrix does not exceed the fault threshold, determining that the data representing the target variable in the operational data set does not include a fault.
9. The method of claim 8, wherein determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold further comprises: if the calculated difference matrix exceeds the fault threshold, determining that the data representing the target variable in the operational data set includes a fault.
10. The method of claim 8, wherein the metric comprises a weighted sum of conditional correlation and conditional mutual information between the target variable and the other variables in the set, and determining whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold further comprises: if the difference matrix exceeds the fault threshold: performing an optimisation of the values of the weights for the weighted sum; constructing an updated adjacency matrix for each of the reference and operational data sets; wherein the updated adjacency matrices are constructed on the basis of a metric calculated using the optimised weight values; recalculating the difference matrix on the basis of the updated adjacency matrices for the reference and operational data sets; performing a comparison between the recalculated difference matrix and the fault threshold; and if the recalculated difference matrix does not exceed the fault threshold, determining that the data representing the target variable in the operational data set does not include a fault.
11. The method of claim 10, further comprising: if the recalculated difference matrix exceeds the fault threshold, determining that the data representing the target variable in the operational data set includes a fault.
12. The method of claim 8, wherein the fault threshold comprises a value; wherein performing a comparison between a difference matrix and the fault threshold comprises comparing each entry in the difference matrix to the value of the fault threshold; and wherein the difference matrix exceeds the fault threshold if at least one entry in the difference matrix exceeds the value of the fault threshold.
13. The method of claim 12, further comprising, if an entry in the difference matrix exceeds the value of the fault threshold: determining that the data representing the target variable in the operational data set includes a fault, and that the source of the fault in the data is the variable corresponding to the entry in the difference matrix that exceeds the threshold value.
14. The method of claim 12, further comprising, if every entry in the difference matrix exceeds the value of the fault threshold: determining that the data representing the target variable in the operational data set includes a fault, and that the source of the fault in the data is the target variable.
15. The method of claim 1, wherein the fault threshold is selected to account for expected statistical variation in the data.
16. The method of claim 1, wherein constructing an adjacency matrix between the target variable and the other variables in the set of variables comprises: filtering the other variables in the set of variables according to the value of the metric calculated between the target variable and the other variables of the set; and including in the adjacency matrix those other variables of the set of variables that have a value of the calculated metric above an inclusion threshold.
17. The method of claim 10, wherein performing an optimization of the values of the weights for the weighted sum comprises: obtaining a plurality of operational data sets for the set of variables in the system including the target variable, the plurality of operational data sets including data for the set of variables at different times during operation of the system; constructing an adjacency matrix between the target variable and the other variables in the set of variables for each of the plurality of operational data sets; for each of the plurality of operational data sets, calculating a difference matrix between the adjacency matrices for the reference and operational data sets; and identifying values for the weights for the weighted sum that minimize the sum, over all of the operational data sets, of the sum of all entries in each difference matrix.
18. The method of claim 17, wherein identifying values of the weights for the weighted sum that minimize the sum, over all of the operational data sets, of the sum of all entries in each difference matrix comprises solving the optimization problem: $\min_{w_1, w_2} \sum_{i=1}^{N_s} \sum \delta_i \quad \text{such that} \quad 0 \le w_1 \le 1,\; 0 \le w_2 \le 1,\; w_1 + w_2 = 1$ where: $N_s$ is the number of operational data sets; $\delta_i$ is the difference matrix for operational data set i; and $w_1$ and $w_2$ are the weights of the weighted sum.
19. The method of claim 1, further comprising, if it is determined that the data representing the target variable in the operational data set includes a fault: repeating the steps of the method for operational data sets at different time instances to identify the time instance at which the difference matrix first exceeds the fault threshold.
20-26. (canceled)
27. A controller for identifying a fault in data representing a target variable of a system, wherein the system comprises a plurality of variables, and wherein each variable is represented by a data stream, the controller comprising a processor and a memory, the memory containing instructions executable by the processor such that the controller is operable to: obtain a reference data set for a set of variables in the system including the target variable; obtain an operational data set for the set of variables in the system including the target variable; for each of the reference data set and the operational data set: construct an adjacency matrix between the target variable and other variables in the set of variables, wherein the adjacency matrix is constructed on the basis of a metric calculated between the target variable and the other variables of the set; calculate a difference matrix between the adjacency matrices for the reference and operational data sets; and determine whether the data representing the target variable in the operational data set includes a fault on the basis of a comparison between the calculated difference matrix and a fault threshold.
28-30. (canceled)
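The entry-wise threshold logic of claims 12 to 14, the inclusion filter of claim 16 and the time search of claim 19 can be sketched in Python as follows. This is an illustrative sketch only, under several assumptions: it treats the difference matrix as the target variable's row (one entry per other variable), compares entry magnitudes with a scalar threshold, checks the "every entry exceeds" case of claim 14 before the single-entry case of claim 13, and uses hypothetical function and variable names.

```python
import numpy as np


def filter_neighbours(metric_values, names, inclusion_threshold):
    """Claim 16: keep only variables whose metric with the target exceeds the inclusion threshold."""
    return [n for n, v in zip(names, metric_values) if v > inclusion_threshold]


def locate_fault(delta, names, threshold):
    """Claims 12-14: None if no fault, 'target' if every entry exceeds, else the flagged variables."""
    exceed = np.abs(np.asarray(delta, dtype=float)) > threshold
    if not exceed.any():
        return None  # claim 8: no fault in the target's data
    if exceed.all():
        return "target"  # claim 14: the target variable is the source
    return [n for n, e in zip(names, exceed) if e]  # claim 13: the flagged variable(s)


def first_fault_time(deltas, threshold):
    """Claim 19: index of the first time window whose difference matrix exceeds the threshold."""
    for t, delta in enumerate(deltas):
        if (np.abs(np.asarray(delta, dtype=float)) > threshold).any():
            return t
    return None
```

With the mine example in mind, a difference row of `[0.02, -0.4, 0.01]` over the variables `["pressure", "coolant_flow", "people"]` and a threshold of 0.1 would attribute the fault to `coolant_flow`, whereas a row in which every entry exceeds 0.1 would point to the target variable itself.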